A Commuter’s Challenge in Suburban Philadelphia

Every morning, Midge, a recent graduate working in downtown Philadelphia, starts her day battling the limitations of suburban bus transit. With services far less frequent than those in urban centers, her story illustrates the broader struggles faced by many suburban commuters, dependent on a system that doesn’t meet their needs.

Urban Efficiency vs. Suburban Scarcity

Using the latest SEPTA General Transit Feed Specification (GTFS) data1, it reveals that a staggering 80% of SEPTA’s trips are made by bus, showcasing the critical role of bus transit in Philadelphia’s transportation network. This statistic underlines the heavy reliance on buses across the city, not just in suburban areas.

# Import GTFS data
stops <- read.csv("data/google_bus/stops.txt", header = TRUE, sep = ",")
stop_times <- read.csv("data/google_bus/stop_times.txt", header = TRUE, sep = ",")
trips <- read.csv("data/google_bus/trips.txt", header = TRUE, sep = ",")
shapes <- read.csv("data/google_bus/shapes.txt", header = TRUE, sep = ",")
routes <- read.csv("data/google_bus/routes.txt", header = TRUE, sep = ",")
calendar <- read.csv("data/google_bus/calendar.txt", header = TRUE, sep = ",")
calendar_dates <- read.csv("data/google_bus/calendar_dates.txt", header = TRUE, sep = ",")

# Import Swiftly data
swiftly <- st_read("data/SwiftlyShapes/geoJSONs/SEPTASurface_HighResSpeed_v2.json")
# Join stop_times and trips, from now on dat will be our main data set
dat <- left_join(stop_times, trips, by = "trip_id")

# join dat and routes
dat <- left_join(dat, routes, by = "route_id")

#determine route_type
dat <- dat %>%
  mutate(mode = if_else(route_type == 0, "Trolley", ifelse(route_type == 1, "Metro", ifelse(route_type == 3, "Bus", "Trolleybus"))))

# Mode share (not ridership)
dat_trip_by_mode <- dat %>%
  group_by(mode) %>%
  summarise(mode_count = n(), .groups = 'drop') %>%
  mutate(total_count = sum(mode_count),
         mode_share = round(mode_count/sum(mode_count)*100, digits = 2)) %>%
  ungroup() 
  
# Plot mode share using ggplot2 and the custom palette
ggplot(dat_trip_by_mode, aes(x = mode, y = mode_share, fill = mode)) +
  geom_bar(stat = "identity", width = 0.75) +
  scale_fill_manual(values = bus_revolution_palette(length(unique(dat_trip_by_mode$mode)))) +
  geom_text(aes(label = paste0(mode_share, "% (", mode_count, " trips)")), 
            vjust = -0.5, color = "black", size = 2.5) +
  labs(title = "Mode Share by Transit Type",
       x = "Mode",
       y = "Mode Share (%)",
       fill = "Mode") +
  theme_minimal() +
  theme(legend.position = "right",
        plot.title = element_text(hjust = 0.5))

# total trip bus & trolley bus 1.980.505, total observations 2.026.677

The data also highlights distinct patterns in ridership, with bus usage peaking during morning (6-9AM) and evening (3-6PM) rush hours on weekdays. This contrasts with weekends, where the pattern flattens, indicating a more even distribution of trips throughout the day.

# CREATE LINE GRAPH OF TRIPS BY SERVICE PERIODS

# Function to normalize times greater than 24:00:00. Important step: check the stop_times$arrival_time and departure_time values if it's greater than 24:00 
normalize_time <- function(time_str) {
  time_parts <- strsplit(time_str, ":")[[1]]
  hours <- as.numeric(time_parts[1])
  minutes <- as.numeric(time_parts[2])
  seconds <- as.numeric(time_parts[3])
  
  if (hours >= 24) {
    hours <- hours - 24
    time_str <- sprintf("%02d:%02d:%02d", hours, minutes, seconds)
  }
  
  return(time_str)
}

#identify peak_pm category 
dat <- dat %>%
  mutate(departure_time = sapply(departure_time, normalize_time), # departure_time's normalized
         departure_time = lubridate::hms(departure_time),
         hour = hour(departure_time),  # Extract hour component for easier condition checking
         peak_category = if_else(hour >= 4 & hour < 6, "Early Morning",
                                    if_else(hour >= 6 & hour < 9, "AM Peak",
                                            if_else(hour >= 9 & hour < 15, "Mid Day",
                                                    if_else(hour >= 15 & hour < 18, "PM Peak",
                                                            if_else(hour >= 18 & hour < 22, "Evening", 
                                                                    ifelse(hour >= 22 & hour < 24, "Late Night", "Owl"))))))) 

# # Check if there are any NA values in the 'departure_time' column
# summarise(dat,
#     na_count = sum(is.na(departure_time)), #Count NA values
#     total_rows = n(),  # Total number of rows for context
#     percentage_na = na_count / total_rows * 100  # Percentage of NA values
#   )

durations <- c("Owl" = 4, "Early Morning" = 2, "AM Peak" = 3, "Mid Day" = 6, "PM Peak" = 3, "Evening" = 4, "Late Night" = 2)

dat_service_period <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(peak_category)%>%
  summarise(peak_count = n(), .groups = 'drop') %>%
  mutate(total_count = sum(peak_count),
         peak_share = round(peak_count/sum(peak_count)*100, digits = 2)) %>%
  arrange(factor(peak_category, levels = c("Owl", "Early Morning", "AM Peak", "Mid Day", "PM Peak", "Evening", "Late Night"))) %>%
  ungroup()

# Create a data frame for service periods
service_periods <- data.frame(
  peak_category = c("Owl", "Early Morning", "AM Peak", "Mid Day", "PM Peak", "Evening", "Late Night"),
  start_hour = c(0, 4, 6, 9, 15, 18, 22),
  end_hour = c(4, 6, 9, 15, 18, 22, 24)
)

# CREATE BAR CHART OF TRIPS BY SERVICE TYPES

# Ensure 'calendar' has 'day_type' defined
calendar <- calendar %>%
  mutate(
    day_type = case_when(
      monday == 1 & tuesday == 1 & wednesday == 1 & thursday == 1 & friday == 1 & saturday == 0 & sunday == 0 ~ "Weekday",
      (saturday == 1 | sunday == 1) ~ "Weekend",
      TRUE ~ "irregular"
    )
  )

# Calendar_dates includes 'service_id', 'date', and 'exception_type'
# Ensure 'calendar_dates' has 'day_type' defined based on the date
calendar_dates <- calendar_dates %>%
  filter(exception_type == 1) %>% #we only need exception_type == 1, those added to the service
  mutate(
    date = ymd(date),  # Convert date format if necessary
    day_of_week = wday(date, label = TRUE, week_start = 1),
    day_type = case_when(
      day_of_week %in% c("Mon", "Tue", "Wed", "Thu", "Fri") ~ "Weekday",  # Monday to Friday
      TRUE ~ "Weekend"  # Saturday and Sunday
    )
  )

# Join calendar_dates with calendar
calendar_joined <- calendar_dates %>%
  full_join(calendar, by = "service_id", suffix = c(".dates", ".cal")) %>%
  mutate(
    final_day_type = case_when(
      # Use calendar_dates' day_type if exception_type indicates added service
      exception_type == 1 & day_type.dates == "Weekday" ~ "Weekday",
      exception_type == 1 & day_type.dates == "Weekend" ~ "Weekend",
      # Default to calendar's day_type where there's no specific exception noted
      TRUE ~ day_type.cal
    )
  ) %>%
  select(service_id, final_day_type) %>%
  group_by(service_id, final_day_type) %>%
  summarise(count = n(), .groups = 'drop') %>%
  arrange(service_id, desc(count)) %>%
  distinct(service_id, .keep_all = TRUE) %>% # Make sure no double day types for service ID
  select(-count)
  
dat <- dat %>%
  left_join(calendar_joined, by = "service_id")

dat_service_type <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(final_day_type)%>%
  summarise(day_type_count = n(), .groups = 'drop') %>%
  mutate(total_count = sum(day_type_count),
         peak_share = round(day_type_count/sum(day_type_count)*100, digits = 2)) %>%
  arrange(factor(final_day_type, levels = c("Weekday", "Weekend"))) %>%
  ungroup()

# Summarize data to get the number of trips per hour
hourly_trips <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(hour, final_day_type) %>%
  summarise(number_of_trips = n())

# Define a custom palette for the service periods
service_period_colors <- c(
  "Owl" = "#E5E5E5",
  "Early Morning" = "#C5E0B4",
  "AM Peak" = "#A9D08E",
  "Mid Day" = "#FFD966",
  "PM Peak" = "#F4B084",
  "Evening" = "#D5A6BD",
  "Late Night" = "#A4C2F4"
)

# Plot the combined graph
ggplot() +
  # Add service period background rectangles
  geom_rect(data = service_periods, aes(xmin = start_hour, xmax = end_hour, ymin = 0, ymax = max(hourly_trips$number_of_trips), fill = peak_category), alpha = 0.3) +
  # Add the line plot for weekday trips
  geom_line(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#F2AE2E", size = 1) +
  geom_point(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#26A699", size = 2) +
  # Add the line plot for weekend trips
  geom_line(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#A6123A", size = 1) +
  geom_point(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#26A699", size = 2) +
  # Define scales and labels
  scale_x_continuous(breaks = 0:23) +
  scale_y_continuous(labels = comma) +
  scale_fill_manual(values = service_period_colors) +
  scale_color_manual(values = c("Weekday" = "#F2AE2E", 
                                "Weekend" = "#A6123A")) +
  labs(title = "Number of Trips by Hour with Service Periods",
       x = "Hour of Day",
       y = "Number of Trips",
       fill = "Service Period",
       color = "Day Type") +
  theme_minimal() +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5))

While bus services are integral to Philadelphia’s transit system, not all areas benefit equally. Data specifically pertaining to suburban regions during weekday AM and PM peak hours reveals a stark disparity: most suburban bus stations face waiting times ranging from 30 to 60 minutes between buses. This significant wait time not only underscores the challenges faced by commuters like Midge but also highlights a broader issue affecting many suburban residents across Southeastern Pennsylvania, who rely heavily on bus transit.

# SET UP STOPS CATEGORY. This step is crucial as we'll use stops information and join them for each kind of maps 

# Convert stops data to sf object
stops <- stops %>%
  st_as_sf(coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# Load PA and SEPTA region map for base map
PA <- st_read("data/PaCounty2024_05.geojson")
SEPTA_region_map <- PA[PA$COUNTY_NAM %in% c("MONTGOMERY", "BUCKS", "PHILADELPHIA", "DELAWARE", "CHESTER"), ] %>%
  select(PA_CTY_COD, COUNTY_NAM, FIPS_COUNT)

# Load Center City boundary
center_city <- st_read("data/20240708_CenterCity/CenterCity.shp")

# Ensure all datasets are in the same CRS
SEPTA_region_map <- st_transform(SEPTA_region_map, crs = st_crs(stops))
center_city <- st_transform(center_city, crs = st_crs(stops))

# Perform the spatial intersection for county
stops <- stops %>%
  st_join(SEPTA_region_map, left = TRUE, join = st_intersects) %>%
  rename(county = COUNTY_NAM) %>%
  mutate(center_city = st_within(stops, center_city, sparse = FALSE))%>%
  mutate(center_city = rowSums(center_city) > 0)  # Convert logical matrix to vector

# Categorize the stops
stops <- stops %>%
  mutate(category = case_when(
    county == "PHILADELPHIA" & center_city == TRUE ~ "PHILADELPHIA-Center_City",
    county == "PHILADELPHIA" & center_city == FALSE ~ "PHILADELPHIA-Not_Center_City",
    TRUE ~ county
  )) %>%
  select(-"center_city")
# SET UP HOURLY AVERAGE TRIP DATA FOR EACH BUS STOP

dat_trip_count_hr <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(stop_id, mode, final_day_type) %>%
  summarise(trip_count = n(), 
            All_Routes = paste(unique(route_id), collapse = ", "),
            .groups = 'drop') %>%
  mutate(avg_trip_hr = round(trip_count/24)) %>%
  mutate(log_avg_trip_hr = round(log1p(avg_trip_hr), digits = 0)) %>%
  mutate(max_freq = ifelse(avg_trip_hr >= 12, "5 MAX",
                           ifelse (avg_trip_hr >= 6, "10 MAX",
                           ifelse (avg_trip_hr >= 4, "15 MAX", 
                           ifelse(avg_trip_hr >= 2, "30 MAX", "60 MAX")))))

dat_trip_count_hr <- left_join(dat_trip_count_hr, stops, by = "stop_id")

dat_trip_count_hr <- st_as_sf(dat_trip_count_hr, coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# sum(dat_trip_count_hr$trip_count) #1980505 <- Always make sure you retain the number of observation
# SET UP AVERAGE TRIP DATA PER SERVICE PERIOD FOR EACH BUS STOP

dat_trip_count_prd <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(stop_id, peak_category, mode, final_day_type) %>%
  summarise(trip_count = n(), 
            All_Routes = paste(unique(route_id), collapse = ", "),
            .groups = 'drop') %>%
  mutate(
    hours = durations[peak_category],
    avg_trip_hr = round(trip_count / hours)
  ) %>%
  select(-hours) %>%
  mutate(log_avg_trip_hr = round(log1p(avg_trip_hr), digits = 0)) %>%
  mutate(max_freq = ifelse(avg_trip_hr >= 12, "5 MAX",
                           ifelse (avg_trip_hr >= 6, "10 MAX",
                           ifelse (avg_trip_hr >= 4, "15 MAX", 
                           ifelse(avg_trip_hr >= 2, "30 MAX", "60 MAX")))))

dat_trip_count_prd <- left_join(dat_trip_count_prd, stops, by = "stop_id")

dat_trip_count_prd <- st_as_sf(dat_trip_count_prd, coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# sum(dat_trip_count_prd$trip_count)  #1980505 <- Always make sure you retain the number of observation
SEPTA_region_map <- st_transform(SEPTA_region_map, st_crs(dat_trip_count_hr))

# # Define color palette
# pal <- colorNumeric(palette = viridisLite::viridis(256, option = "C"), domain = dat_trip_count_hr$avg_trip_hr)

# Define the color codes for bus revolution and reverse them
bus_revolution_colors <- rev(c("#A6123A", "#655BA6", "#26A699", "#F2AE2E", "#F2E5D5"))

# Ensure max_freq is a factor with levels matching the colors
dat_trip_count_hr$max_freq <- factor(dat_trip_count_hr$max_freq, levels = c("5 MAX", "10 MAX", "15 MAX", "30 MAX", "60 MAX"))

# Define the color palette for max_freq
max_freq_colors <- c("5 MAX" = "#4B0082", "10 MAX" = "#FF4500", "15 MAX" = "#A6123A", "30 MAX" = "#26A699", "60 MAX" = "#F2AE2E")
pal_max_freq <- colorFactor(palette = max_freq_colors, domain = dat_trip_count_hr$max_freq)

# # Generate the list for overlayGroups with the desired order
# overlay_groups <- expand.grid(final_day_type = c("Weekday", "Weekend"), peak_category = peak_order) %>%
#   mutate(group_name = paste(final_day_type, peak_category, sep = "_")) %>%
#   pull(group_name)

# Create the leaflet map
m_avg_trip_hr <- leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data = SEPTA_region_map, color = "black", weight = 1, fill = FALSE) %>%
  addCircleMarkers(data = dat_trip_count_hr %>% filter(final_day_type == "Weekday"),
                   group = "Weekday",
                   lng = ~stop_lon, lat = ~stop_lat,
                   color = ~pal_max_freq(max_freq),
                   radius = 1,
                   opacity = 0.8,
                   popup = ~paste0("<b>Stop ID:</b> ", stop_id, "<br>",
                                   "<b>Stop Name:</b> ", stop_name, "<br>",
                                   "<b>Avg Buses/Hr:</b> ", avg_trip_hr, "<br>",
                                   "<b>Max Frequency:</b> ", max_freq, "<br>",
                                   "<b>All Routes:</b> ", All_Routes, "<br>",
                                   "<b>Day Type:</b> ", final_day_type)) %>%
  addCircleMarkers(data = dat_trip_count_hr %>% filter(final_day_type == "Weekend"),
                   group = "Weekend",
                   lng = ~stop_lon, lat = ~stop_lat,
                   color = ~pal_max_freq(max_freq),
                   radius = 1,
                   opacity = 0.8,
                   popup = ~paste0("<b>Stop ID:</b> ", stop_id, "<br>",
                                   "<b>Stop Name:</b> ", stop_name, "<br>",
                                   "<b>Avg Buses/Hr:</b> ", avg_trip_hr, "<br>",
                                   "<b>Max Frequency:</b> ", max_freq, "<br>",
                                   "<b>All Routes:</b> ", All_Routes, "<br>",
                                   "<b>Day Type:</b> ", final_day_type)) %>%
  addLayersControl(
    overlayGroups = c("Weekday", "Weekend"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  hideGroup(c("Weekend")) %>%
  addLegend(pal = pal_max_freq, values = dat_trip_count_hr$max_freq, title = "Max Frequency",
            position = "bottomright")

m_avg_trip_hr

Further analysis of bus stop data by county—excluding Philadelphia’s Center City—reveals a misleading equality in service distribution across the counties. While the top bus stops in each county boast a 5-minute maximum headway, reflecting seemingly equal service levels, the actual number of buses tells a different story. For instance, even at their busiest, bus stops in Bucks County see an average of only 15 buses per hour at the 5 MAX category, starkly less than the 30 buses seen in similar categories in Philadelphia’s outlying areas. This discrepancy highlights the concentration of better services still within closer proximity to urban centers, despite attempts at equitable distribution.

# Filter and select the top 10 stops for 'mode' == 'bus' on the weekday
top_stops_hr_weekday <- dat_trip_count_hr %>%
  filter(mode %in% c("Bus","Trolleybus"), final_day_type == "Weekday") %>%
  select(stop_id, stop_name, avg_trip_hr, All_Routes) %>%
  arrange(desc(avg_trip_hr)) %>%
  slice_head(n = 10)

# Create a table with kable and enhance it with kableExtra
kable(top_stops_hr_weekday, "html", booktabs = TRUE, 
      caption = "Top 10 Bus Stops by Average Trips Per Hour on the Weekday",
      align = c('l', rep('l', ncol(top_stops_hr_weekday) - 1))) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE, color = "#655BA6", width = "5em") %>%
  column_spec(3, bold = TRUE, color = "#A6123A", extra_css = "text-align: right;")
Top 10 Bus Stops by Average Trips Per Hour on the Weekday
stop_id stop_name avg_trip_hr All_Routes geometry
21204 Frankford Transportation Center - Main Dropoff 47 14, 19, 20, 24, 26, 3, 50, 58, 67, 88, BLVDDIR, MFO POINT (-75.07748 40.02329)
1148 69th St Transportation Center South Terminal 44 103, 104, 105, 107, 108, 109, 111, 113, 110, 120, 21, 30, 65, 68, MFO POINT (-75.25828 39.96208)
10266 Market St & 15th St 40 124, 125, 17, 31, 32, 33, 38, 44, 48, 62, BSO, MFO POINT (-75.16548 39.95255)
10272 Market St & 18th St 36 124, 125, 17, 31, 32, 33, 38, 44, 48, 62 POINT (-75.17017 39.95312)
18451 Market St & 16th St 36 124, 125, 17, 31, 32, 33, 38, 44, 48, 62 POINT (-75.16705 39.95274)
136 Cheltenham Av & Ogontz Av Loop 34 16, 6, H, XH POINT (-75.1591 40.07483)
17842 JFK Blvd & 15th St 34 MANNLP, 124, 125, 17, 27, 31, 32, 33, 38, 44, 446, 62, 78 POINT (-75.16476 39.95365)
8936 JFK Blvd & 17th St 31 MANNLP, 124, 125, 17, 31, 32, 33, 38, 44, 446, 62 POINT (-75.16806 39.95406)
101 Pier 70 WalMart & Home Depot 29 25, 29, 64, 7 POINT (-75.14167 39.92531)
15497 69th St Transit Center 29 104, 109, 110, 111, 112, 120, 123, 126 POINT (-75.25968 39.96217)
# Define a custom palette for the categories
category_colors <- c(
  "BUCKS" = "#A9D08E",
  "CHESTER" = "#FFD966",
  "DELAWARE" = "#F4B084",
  "MONTGOMERY" = "#D5A6BD",
  "PHILADELPHIA-Not_Center_City" = "#A4C2F4"
)

# Example with Weekday data
dat_filtered_weekday <- dat_trip_count_hr %>%
  filter(mode %in% c("Bus","Trolleybus"), final_day_type == "Weekday" & category != "PHILADELPHIA-Center_City")

# Function to get top 5 stops per category
get_top_stops_per_category <- function(data, n = 5) {
  data %>%
    group_by(category) %>%
    select(stop_id, stop_name, avg_trip_hr, max_freq, All_Routes, category) %>%
    arrange(desc(avg_trip_hr)) %>%
    slice_head(n = n) %>%
    ungroup()
}

# Get top 5 stops per category for Weekday
top_stops_hr_weekday <- get_top_stops_per_category(dat_filtered_weekday, n = 5)

# Create the table for Weekday with row colors
kable(top_stops_hr_weekday, "html", booktabs = TRUE, 
      caption = "Top 5 Bus Stops by Average Trips Per Hour on the Weekday",
      align = c('l', rep('l', ncol(top_stops_hr_weekday) - 1))) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE, color = "#655BA6", width = "5em") %>%
  column_spec(3, bold = TRUE, color = "#A6123A", extra_css = "text-align: right;") 
Top 5 Bus Stops by Average Trips Per Hour on the Weekday
stop_id stop_name avg_trip_hr max_freq All_Routes category geometry
330 Neshaminy Mall 16 5 MAX 128, 130, 14, 58, BLVDDIR BUCKS POINT (-74.95357 40.13806)
23080 Rockhill Rd & Neshaminy Rd - MBFS 14 5 MAX 128, 130, 14, 58, BLVDDIR BUCKS POINT (-74.95696 40.13665)
23071 Rockhill Rd & Neshaminy Blvd - MBNS 13 5 MAX 14, 58, BLVDDIR BUCKS POINT (-74.95725 40.13658)
22762 Roosevelt Blvd & Old Lincoln Hwy - MBFS 12 5 MAX 1, 14, BLVDDIR BUCKS POINT (-74.97913 40.12061)
25881 Roosevelt Blvd & Old Lincoln Hwy - MBNS 12 5 MAX 1, 14, BLVDDIR BUCKS POINT (-74.97909 40.12022)
1990 West Chester Transportation Center 8 10 MAX 135, 92, 104 CHESTER POINT (-75.6073 39.9579)
18729 Chestnut St & Church St 6 10 MAX 135, 92, 104 CHESTER POINT (-75.60707 39.96102)
18730 Chestnut St & Darlington St 6 10 MAX 135, 92, 104 CHESTER POINT (-75.60838 39.96047)
30146 New St & Chestnut St - FS 6 10 MAX 135, 92, 104 CHESTER POINT (-75.60984 39.95955)
14820 Church St & University Av - FS 5 15 MAX 104 CHESTER POINT (-75.6 39.95252)
1148 69th St Transportation Center South Terminal 44 5 MAX 103, 104, 105, 107, 108, 109, 111, 113, 110, 120, 21, 30, 65, 68, MFO DELAWARE POINT (-75.25828 39.96208)
15497 69th St Transit Center 29 5 MAX 104, 109, 110, 111, 112, 120, 123, 126 DELAWARE POINT (-75.25968 39.96217)
2024 69th St Transportation Center North Terminal 25 5 MAX 103, 105, 106, 123, 30, 65 DELAWARE POINT (-75.25861 39.96282)
14785 Chester Transportation Center 20 5 MAX 117, 119, 113, 37 DELAWARE POINT (-75.35997 39.84921)
30597 Darby Transportation Center - bus 17 5 MAX 114, 115, 113 DELAWARE POINT (-75.26358 39.9191)
136 Cheltenham Av & Ogontz Av Loop 34 5 MAX 16, 6, H, XH MONTGOMERY POINT (-75.1591 40.07483)
1649 Norristown Transportation Center 22 5 MAX 131, 90, 93, 96, 97, 98, 99 MONTGOMERY POINT (-75.34482 40.11345)
809 Willow Grove Park Mall 20 5 MAX 310, 311, 95, 22, 55 MONTGOMERY POINT (-75.12249 40.14226)
1827 Plaza at King of Prussia 19 5 MAX 124, 139, 92, 99, 123, 125 MONTGOMERY POINT (-75.39311 40.08983)
246 Plymouth Meeting Mall 16 5 MAX 150, 90, 95, 98, 27, L MONTGOMERY POINT (-75.28403 40.1172)
21204 Frankford Transportation Center - Main Dropoff 47 5 MAX 14, 19, 20, 24, 26, 3, 50, 58, 67, 88, BLVDDIR, MFO PHILADELPHIA-Not_Center_City POINT (-75.07748 40.02329)
101 Pier 70 WalMart & Home Depot 29 5 MAX 25, 29, 64, 7 PHILADELPHIA-Not_Center_City POINT (-75.14167 39.92531)
28 Wissahickon Transportation Center - onsite 28 5 MAX 124, 125, 1, 35, 38, 61, R PHILADELPHIA-Not_Center_City POINT (-75.20724 40.0149)
8712 Hunting Park Av & 22nd St 27 5 MAX 33, 445, 447, 56, H, R, XH PHILADELPHIA-Not_Center_City POINT (-75.16457 40.01089)
410 Philadelphia Mills & Marshalls 24 5 MAX 129, 130, 20, 50, 67, 84 PHILADELPHIA-Not_Center_City POINT (-74.96447 40.08583)

Impact and Innovation: Enhancing Suburban Bus Services

While the raw data illuminates the service disparities, the true depth of this issue is best captured through the lived experiences of suburban commuters like Midge. Personal stories reveal the broader human impact: missed opportunities, prolonged workdays, and diminished family life due to infrequent bus services. This qualitative insight paints a vivid picture of the daily challenges and underscores the urgency for targeted improvements.

SEPTA’s ongoing “Bus Revolution” project aims to modernize and enhance bus services, promising updated routes and improved frequencies that haven’t been revised since the 1960s. While the initiative seeks to reduce headways to a maximum of 30 minutes even in the most remote suburban areas, it has faced delays and criticism, particularly concerning its potential to disproportionately impact low-income neighborhoods. This tension highlights the complex interplay between improving service efficiency and addressing community concerns of gentrification.

As we conclude our exploration of suburban bus transit disparities in Philadelphia, the compelling narratives of daily commuters, underpinned by robust SEPTA data, paint a clear picture of the challenges and the urgent need for systemic improvements. The “Bus Revolution” project, while a promising step toward enhancing bus frequencies and updating decades-old routes, also highlights the delicate balance required to improve service efficiency without exacerbating social inequities.

Embracing innovative solutions and fostering community engagement in transit planning are crucial. By committing to both technological upgrades and equitable service distribution, SEPTA can ensure that every commuter, regardless of where they live, has reliable and timely access to public transit. This not only improves individual daily experiences but also supports broader goals of economic mobility and sustainable urban development.

As stakeholders, we are called upon to support these initiatives, advocate for transparent and inclusive planning processes, and ensure that the voices of all commuters, especially those most affected like Midge, are heard and addressed in the pursuit of a truly connected and equitable transit system.

---
title: "Waiting for Change: The Urgent Need for Equitable Bus Services in Suburban Philadelphia"
author: "Regy Septian"
date: "2024-09-24"
output: 
  html_document:
    toc: true
    toc_float: true
    code_folding: "hide"
    code_download: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = FALSE)

library(tidyverse)
library(sf)
library(tidytransit)
library(lubridate)
library(hms)
library(tidyr)
library(patchwork)
library(kableExtra)
library(wk)
library(leaflet)
library(leaflet.extras)
library(viridis)
library(viridisLite)
library(htmlwidgets)
library(htmltools)
library(leaflet.extras)
library(forcats)
library(ggplot2)
library(scales)
library(shiny)
library(webshot2)
library(magick)

# Define the color codes
bus_revolution_colors <- c("#A6123A", "#655BA6", "#26A699", "#F2AE2E", "#F2E5D5")

# Create a custom palette function
bus_revolution_palette <- function(n) {
  colorRampPalette(bus_revolution_colors)(n)
}

workdir <- "~/GitHub/MUSA-6310_Assignment-2" # Change the directory
setwd(workdir)
```
### A Commuter's Challenge in Suburban Philadelphia  

Every morning, Midge, a recent graduate working in downtown Philadelphia, starts her day battling the limitations of suburban bus transit. With services far less frequent than those in urban centers, her story illustrates the broader struggles faced by many suburban commuters, dependent on a system that doesn't meet their needs.  

### Urban Efficiency vs. Suburban Scarcity  

Using the latest SEPTA General Transit Feed Specification (GTFS) data[^1], it reveals that a staggering 80% of SEPTA's trips are made by bus, showcasing the critical role of bus transit in Philadelphia’s transportation network. This statistic underlines the heavy reliance on buses across the city, not just in suburban areas. 

[^1]: [SEPTA GTFS](https://www3.septa.org/developer/gtfs_public.zip)


```{r import gtfs, message=FALSE, warning=FALSE, cache=TRUE, results = 'hide'}
# Import GTFS data
stops <- read.csv("data/google_bus/stops.txt", header = TRUE, sep = ",")
stop_times <- read.csv("data/google_bus/stop_times.txt", header = TRUE, sep = ",")
trips <- read.csv("data/google_bus/trips.txt", header = TRUE, sep = ",")
shapes <- read.csv("data/google_bus/shapes.txt", header = TRUE, sep = ",")
routes <- read.csv("data/google_bus/routes.txt", header = TRUE, sep = ",")
calendar <- read.csv("data/google_bus/calendar.txt", header = TRUE, sep = ",")
calendar_dates <- read.csv("data/google_bus/calendar_dates.txt", header = TRUE, sep = ",")

# Import Swiftly data
swiftly <- st_read("data/SwiftlyShapes/geoJSONs/SEPTASurface_HighResSpeed_v2.json")

```

```{r total trip, message=FALSE, warning=FALSE, cache=TRUE}
# Join stop_times and trips, from now on dat will be our main data set
dat <- left_join(stop_times, trips, by = "trip_id")

# join dat and routes
dat <- left_join(dat, routes, by = "route_id")

#determine route_type
dat <- dat %>%
  mutate(mode = if_else(route_type == 0, "Trolley", ifelse(route_type == 1, "Metro", ifelse(route_type == 3, "Bus", "Trolleybus"))))

# Mode share (not ridership)
dat_trip_by_mode <- dat %>%
  group_by(mode) %>%
  summarise(mode_count = n(), .groups = 'drop') %>%
  mutate(total_count = sum(mode_count),
         mode_share = round(mode_count/sum(mode_count)*100, digits = 2)) %>%
  ungroup() 
  
# Plot mode share using ggplot2 and the custom palette
ggplot(dat_trip_by_mode, aes(x = mode, y = mode_share, fill = mode)) +
  geom_bar(stat = "identity", width = 0.75) +
  scale_fill_manual(values = bus_revolution_palette(length(unique(dat_trip_by_mode$mode)))) +
  geom_text(aes(label = paste0(mode_share, "% (", mode_count, " trips)")), 
            vjust = -0.5, color = "black", size = 2.5) +
  labs(title = "Mode Share by Transit Type",
       x = "Mode",
       y = "Mode Share (%)",
       fill = "Mode") +
  theme_minimal() +
  theme(legend.position = "right",
        plot.title = element_text(hjust = 0.5))

# total trip bus & trolley bus 1.980.505, total observations 2.026.677
```

The data also highlights distinct patterns in ridership, with bus usage peaking during morning (6-9AM) and evening (3-6PM) rush hours on weekdays. This contrasts with weekends, where the pattern flattens, indicating a more even distribution of trips throughout the day.
  
```{r peak time, message=FALSE, warning=FALSE, cache=TRUE, results = 'hide'}
# CREATE LINE GRAPH OF TRIPS BY SERVICE PERIODS

# Function to normalize times greater than 24:00:00. Important step: check the stop_times$arrival_time and departure_time values if it's greater than 24:00 
normalize_time <- function(time_str) {
  time_parts <- strsplit(time_str, ":")[[1]]
  hours <- as.numeric(time_parts[1])
  minutes <- as.numeric(time_parts[2])
  seconds <- as.numeric(time_parts[3])
  
  if (hours >= 24) {
    hours <- hours - 24
    time_str <- sprintf("%02d:%02d:%02d", hours, minutes, seconds)
  }
  
  return(time_str)
}

#identify peak_pm category 
dat <- dat %>%
  mutate(departure_time = sapply(departure_time, normalize_time), # departure_time's normalized
         departure_time = lubridate::hms(departure_time),
         hour = hour(departure_time),  # Extract hour component for easier condition checking
         peak_category = if_else(hour >= 4 & hour < 6, "Early Morning",
                                    if_else(hour >= 6 & hour < 9, "AM Peak",
                                            if_else(hour >= 9 & hour < 15, "Mid Day",
                                                    if_else(hour >= 15 & hour < 18, "PM Peak",
                                                            if_else(hour >= 18 & hour < 22, "Evening", 
                                                                    ifelse(hour >= 22 & hour < 24, "Late Night", "Owl"))))))) 

# # Check if there are any NA values in the 'departure_time' column
# summarise(dat,
#     na_count = sum(is.na(departure_time)), #Count NA values
#     total_rows = n(),  # Total number of rows for context
#     percentage_na = na_count / total_rows * 100  # Percentage of NA values
#   )

durations <- c("Owl" = 4, "Early Morning" = 2, "AM Peak" = 3, "Mid Day" = 6, "PM Peak" = 3, "Evening" = 4, "Late Night" = 2)

dat_service_period <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(peak_category)%>%
  summarise(peak_count = n(), .groups = 'drop') %>%
  mutate(total_count = sum(peak_count),
         peak_share = round(peak_count/sum(peak_count)*100, digits = 2)) %>%
  arrange(factor(peak_category, levels = c("Owl", "Early Morning", "AM Peak", "Mid Day", "PM Peak", "Evening", "Late Night"))) %>%
  ungroup()

# Create a data frame for service periods
service_periods <- data.frame(
  peak_category = c("Owl", "Early Morning", "AM Peak", "Mid Day", "PM Peak", "Evening", "Late Night"),
  start_hour = c(0, 4, 6, 9, 15, 18, 22),
  end_hour = c(4, 6, 9, 15, 18, 22, 24)
)

# CREATE BAR CHART OF TRIPS BY SERVICE TYPES

# Ensure 'calendar' has 'day_type' defined
calendar <- calendar %>%
  mutate(
    day_type = case_when(
      monday == 1 & tuesday == 1 & wednesday == 1 & thursday == 1 & friday == 1 & saturday == 0 & sunday == 0 ~ "Weekday",
      (saturday == 1 | sunday == 1) ~ "Weekend",
      TRUE ~ "irregular"
    )
  )

# Calendar_dates includes 'service_id', 'date', and 'exception_type'
# Ensure 'calendar_dates' has 'day_type' defined based on the date
calendar_dates <- calendar_dates %>%
  filter(exception_type == 1) %>% #we only need exception_type == 1, those added to the service
  mutate(
    date = ymd(date),  # Convert date format if necessary
    day_of_week = wday(date, label = TRUE, week_start = 1),
    day_type = case_when(
      day_of_week %in% c("Mon", "Tue", "Wed", "Thu", "Fri") ~ "Weekday",  # Monday to Friday
      TRUE ~ "Weekend"  # Saturday and Sunday
    )
  )

# Join calendar_dates with calendar
calendar_joined <- calendar_dates %>%
  full_join(calendar, by = "service_id", suffix = c(".dates", ".cal")) %>%
  mutate(
    final_day_type = case_when(
      # Use calendar_dates' day_type if exception_type indicates added service
      exception_type == 1 & day_type.dates == "Weekday" ~ "Weekday",
      exception_type == 1 & day_type.dates == "Weekend" ~ "Weekend",
      # Default to calendar's day_type where there's no specific exception noted
      TRUE ~ day_type.cal
    )
  ) %>%
  select(service_id, final_day_type) %>%
  group_by(service_id, final_day_type) %>%
  summarise(count = n(), .groups = 'drop') %>%
  arrange(service_id, desc(count)) %>%
  distinct(service_id, .keep_all = TRUE) %>% # Make sure no double day types for service ID
  select(-count)
  
dat <- dat %>%
  left_join(calendar_joined, by = "service_id")

dat_service_type <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(final_day_type)%>%
  summarise(day_type_count = n(), .groups = 'drop') %>%
  mutate(total_count = sum(day_type_count),
         peak_share = round(day_type_count/sum(day_type_count)*100, digits = 2)) %>%
  arrange(factor(final_day_type, levels = c("Weekday", "Weekend"))) %>%
  ungroup()

# Summarize data to get the number of trips per hour
hourly_trips <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(hour, final_day_type) %>%
  summarise(number_of_trips = n())

# Define a custom palette for the service periods
service_period_colors <- c(
  "Owl" = "#E5E5E5",
  "Early Morning" = "#C5E0B4",
  "AM Peak" = "#A9D08E",
  "Mid Day" = "#FFD966",
  "PM Peak" = "#F4B084",
  "Evening" = "#D5A6BD",
  "Late Night" = "#A4C2F4"
)

# Plot the combined graph
ggplot() +
  # Add service period background rectangles
  geom_rect(data = service_periods, aes(xmin = start_hour, xmax = end_hour, ymin = 0, ymax = max(hourly_trips$number_of_trips), fill = peak_category), alpha = 0.3) +
  # Add the line plot for weekday trips
  geom_line(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#F2AE2E", size = 1) +
  geom_point(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#26A699", size = 2) +
  # Add the line plot for weekend trips
  geom_line(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#A6123A", size = 1) +
  geom_point(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#26A699", size = 2) +
  # Define scales and labels
  scale_x_continuous(breaks = 0:23) +
  scale_y_continuous(labels = comma) +
  scale_fill_manual(values = service_period_colors) +
  scale_color_manual(values = c("Weekday" = "#F2AE2E", 
                                "Weekend" = "#A6123A")) +
  labs(title = "Number of Trips by Hour with Service Periods",
       x = "Hour of Day",
       y = "Number of Trips",
       fill = "Service Period",
       color = "Day Type") +
  theme_minimal() +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5))
  
```

While bus services are integral to Philadelphia's transit system, not all areas benefit equally. Data specifically pertaining to suburban regions during weekday AM and PM peak hours reveals a stark disparity: most suburban bus stations face waiting times ranging from 30 to 60 minutes between buses. This significant wait time not only underscores the challenges faced by commuters like Midge but also highlights a broader issue affecting many suburban residents across Southeastern Pennsylvania, who rely heavily on bus transit.  

```{r set up stops category, message=FALSE, warning=FALSE, cache=TRUE, results = 'hide'}
# SET UP STOPS CATEGORY. This step is crucial as we'll use stops information and join them for each kind of maps 

# Convert stops data to sf object
stops <- stops %>%
  st_as_sf(coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# Load PA and SEPTA region map for base map
PA <- st_read("data/PaCounty2024_05.geojson")
SEPTA_region_map <- PA[PA$COUNTY_NAM %in% c("MONTGOMERY", "BUCKS", "PHILADELPHIA", "DELAWARE", "CHESTER"), ] %>%
  select(PA_CTY_COD, COUNTY_NAM, FIPS_COUNT)

# Load Center City boundary
center_city <- st_read("data/20240708_CenterCity/CenterCity.shp")

# Ensure all datasets are in the same CRS
SEPTA_region_map <- st_transform(SEPTA_region_map, crs = st_crs(stops))
center_city <- st_transform(center_city, crs = st_crs(stops))

# Perform the spatial intersection for county
stops <- stops %>%
  st_join(SEPTA_region_map, left = TRUE, join = st_intersects) %>%
  rename(county = COUNTY_NAM) %>%
  mutate(center_city = st_within(stops, center_city, sparse = FALSE))%>%
  mutate(center_city = rowSums(center_city) > 0)  # Convert logical matrix to vector

# Categorize the stops
stops <- stops %>%
  mutate(category = case_when(
    county == "PHILADELPHIA" & center_city == TRUE ~ "PHILADELPHIA-Center_City",
    county == "PHILADELPHIA" & center_city == FALSE ~ "PHILADELPHIA-Not_Center_City",
    TRUE ~ county
  )) %>%
  select(-"center_city")
  
```

```{r hourly stop data, message=FALSE, warning=FALSE, cache=TRUE, results = 'hide'}
# SET UP HOURLY AVERAGE TRIP DATA FOR EACH BUS STOP

dat_trip_count_hr <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(stop_id, mode, final_day_type) %>%
  summarise(trip_count = n(), 
            All_Routes = paste(unique(route_id), collapse = ", "),
            .groups = 'drop') %>%
  mutate(avg_trip_hr = round(trip_count/24)) %>%
  mutate(log_avg_trip_hr = round(log1p(avg_trip_hr), digits = 0)) %>%
  mutate(max_freq = ifelse(avg_trip_hr >= 12, "5 MAX",
                           ifelse (avg_trip_hr >= 6, "10 MAX",
                           ifelse (avg_trip_hr >= 4, "15 MAX", 
                           ifelse(avg_trip_hr >= 2, "30 MAX", "60 MAX")))))

dat_trip_count_hr <- left_join(dat_trip_count_hr, stops, by = "stop_id")

dat_trip_count_hr <- st_as_sf(dat_trip_count_hr, coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# sum(dat_trip_count_hr$trip_count) #1980505 <- Always make sure you retain the number of observation

```

```{r service period stop data, message=FALSE, warning=FALSE, cache=TRUE, results = 'hide'}
# SET UP AVERAGE TRIP DATA PER SERVICE PERIOD FOR EACH BUS STOP

dat_trip_count_prd <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(stop_id, peak_category, mode, final_day_type) %>%
  summarise(trip_count = n(), 
            All_Routes = paste(unique(route_id), collapse = ", "),
            .groups = 'drop') %>%
  mutate(
    hours = durations[peak_category],
    avg_trip_hr = round(trip_count / hours)
  ) %>%
  select(-hours) %>%
  mutate(log_avg_trip_hr = round(log1p(avg_trip_hr), digits = 0)) %>%
  mutate(max_freq = ifelse(avg_trip_hr >= 12, "5 MAX",
                           ifelse (avg_trip_hr >= 6, "10 MAX",
                           ifelse (avg_trip_hr >= 4, "15 MAX", 
                           ifelse(avg_trip_hr >= 2, "30 MAX", "60 MAX")))))

dat_trip_count_prd <- left_join(dat_trip_count_prd, stops, by = "stop_id")

dat_trip_count_prd <- st_as_sf(dat_trip_count_prd, coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# sum(dat_trip_count_prd$trip_count)  #1980505 <- Always make sure you retain the number of observation
```

```{r average bus hourly maxfreq, message=FALSE, warning=FALSE, cache=FALSE}
SEPTA_region_map <- st_transform(SEPTA_region_map, st_crs(dat_trip_count_hr))

# # Define color palette
# pal <- colorNumeric(palette = viridisLite::viridis(256, option = "C"), domain = dat_trip_count_hr$avg_trip_hr)

# Define the color codes for bus revolution and reverse them
bus_revolution_colors <- rev(c("#A6123A", "#655BA6", "#26A699", "#F2AE2E", "#F2E5D5"))

# Ensure max_freq is a factor with levels matching the colors
dat_trip_count_hr$max_freq <- factor(dat_trip_count_hr$max_freq, levels = c("5 MAX", "10 MAX", "15 MAX", "30 MAX", "60 MAX"))

# Define the color palette for max_freq
max_freq_colors <- c("5 MAX" = "#4B0082", "10 MAX" = "#FF4500", "15 MAX" = "#A6123A", "30 MAX" = "#26A699", "60 MAX" = "#F2AE2E")
pal_max_freq <- colorFactor(palette = max_freq_colors, domain = dat_trip_count_hr$max_freq)

# # Generate the list for overlayGroups with the desired order
# overlay_groups <- expand.grid(final_day_type = c("Weekday", "Weekend"), peak_category = peak_order) %>%
#   mutate(group_name = paste(final_day_type, peak_category, sep = "_")) %>%
#   pull(group_name)

# Create the leaflet map
m_avg_trip_hr <- leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data = SEPTA_region_map, color = "black", weight = 1, fill = FALSE) %>%
  addCircleMarkers(data = dat_trip_count_hr %>% filter(final_day_type == "Weekday"),
                   group = "Weekday",
                   lng = ~stop_lon, lat = ~stop_lat,
                   color = ~pal_max_freq(max_freq),
                   radius = 1,
                   opacity = 0.8,
                   popup = ~paste0("<b>Stop ID:</b> ", stop_id, "<br>",
                                   "<b>Stop Name:</b> ", stop_name, "<br>",
                                   "<b>Avg Buses/Hr:</b> ", avg_trip_hr, "<br>",
                                   "<b>Max Frequency:</b> ", max_freq, "<br>",
                                   "<b>All Routes:</b> ", All_Routes, "<br>",
                                   "<b>Day Type:</b> ", final_day_type)) %>%
  addCircleMarkers(data = dat_trip_count_hr %>% filter(final_day_type == "Weekend"),
                   group = "Weekend",
                   lng = ~stop_lon, lat = ~stop_lat,
                   color = ~pal_max_freq(max_freq),
                   radius = 1,
                   opacity = 0.8,
                   popup = ~paste0("<b>Stop ID:</b> ", stop_id, "<br>",
                                   "<b>Stop Name:</b> ", stop_name, "<br>",
                                   "<b>Avg Buses/Hr:</b> ", avg_trip_hr, "<br>",
                                   "<b>Max Frequency:</b> ", max_freq, "<br>",
                                   "<b>All Routes:</b> ", All_Routes, "<br>",
                                   "<b>Day Type:</b> ", final_day_type)) %>%
  addLayersControl(
    overlayGroups = c("Weekday", "Weekend"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  hideGroup(c("Weekend")) %>%
  addLegend(pal = pal_max_freq, values = dat_trip_count_hr$max_freq, title = "Max Frequency",
            position = "bottomright")

m_avg_trip_hr
```

Further analysis of bus stop data by county—excluding Philadelphia's Center City—reveals a misleading equality in service distribution across the counties. While the top bus stops in each county boast a 5-minute maximum headway, reflecting seemingly equal service levels, the actual number of buses tells a different story. For instance, even at their busiest, bus stops in Bucks County see an average of only 15 buses per hour at the 5 MAX category, starkly less than the 30 buses seen in similar categories in Philadelphia’s outlying areas. This discrepancy highlights the concentration of better services still within closer proximity to urban centers, despite attempts at equitable distribution.

```{r top 10 stops table, message=FALSE, warning=FALSE, cache=TRUE}
# Filter and select the top 10 stops for 'mode' == 'bus' on the weekday
top_stops_hr_weekday <- dat_trip_count_hr %>%
  filter(mode %in% c("Bus","Trolleybus"), final_day_type == "Weekday") %>%
  select(stop_id, stop_name, avg_trip_hr, All_Routes) %>%
  arrange(desc(avg_trip_hr)) %>%
  slice_head(n = 10)

# Create a table with kable and enhance it with kableExtra
kable(top_stops_hr_weekday, "html", booktabs = TRUE, 
      caption = "Top 10 Bus Stops by Average Trips Per Hour on the Weekday",
      align = c('l', rep('l', ncol(top_stops_hr_weekday) - 1))) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE, color = "#655BA6", width = "5em") %>%
  column_spec(3, bold = TRUE, color = "#A6123A", extra_css = "text-align: right;")

```


```{r top 5 stops table, message=FALSE, warning=FALSE, cache=TRUE}
# Define a custom palette for the categories
category_colors <- c(
  "BUCKS" = "#A9D08E",
  "CHESTER" = "#FFD966",
  "DELAWARE" = "#F4B084",
  "MONTGOMERY" = "#D5A6BD",
  "PHILADELPHIA-Not_Center_City" = "#A4C2F4"
)

# Example with Weekday data
dat_filtered_weekday <- dat_trip_count_hr %>%
  filter(mode %in% c("Bus","Trolleybus"), final_day_type == "Weekday" & category != "PHILADELPHIA-Center_City")

# Function to get top 5 stops per category
get_top_stops_per_category <- function(data, n = 5) {
  data %>%
    group_by(category) %>%
    select(stop_id, stop_name, avg_trip_hr, max_freq, All_Routes, category) %>%
    arrange(desc(avg_trip_hr)) %>%
    slice_head(n = n) %>%
    ungroup()
}

# Get top 5 stops per category for Weekday
top_stops_hr_weekday <- get_top_stops_per_category(dat_filtered_weekday, n = 5)

# Create the table for Weekday with row colors
kable(top_stops_hr_weekday, "html", booktabs = TRUE, 
      caption = "Top 5 Bus Stops by Average Trips Per Hour on the Weekday",
      align = c('l', rep('l', ncol(top_stops_hr_weekday) - 1))) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE, color = "#655BA6", width = "5em") %>%
  column_spec(3, bold = TRUE, color = "#A6123A", extra_css = "text-align: right;") 
```

### Impact and Innovation: Enhancing Suburban Bus Services

While the raw data illuminates the service disparities, the true depth of this issue is best captured through the lived experiences of suburban commuters like Midge. Personal stories reveal the broader human impact: missed opportunities, prolonged workdays, and diminished family life due to infrequent bus services. This qualitative insight paints a vivid picture of the daily challenges and underscores the urgency for targeted improvements.

SEPTA's ongoing "Bus Revolution" project aims to modernize and enhance bus services, promising updated routes and improved frequencies that haven't been revised since the 1960s. While the initiative seeks to reduce headways to a maximum of 30 minutes even in the most remote suburban areas, it has faced delays and criticism, particularly concerning its potential to disproportionately impact low-income neighborhoods. This tension highlights the complex interplay between improving service efficiency and addressing community concerns of gentrification.

As we conclude our exploration of suburban bus transit disparities in Philadelphia, the compelling narratives of daily commuters, underpinned by robust SEPTA data, paint a clear picture of the challenges and the urgent need for systemic improvements. The "Bus Revolution" project, while a promising step toward enhancing bus frequencies and updating decades-old routes, also highlights the delicate balance required to improve service efficiency without exacerbating social inequities.

Embracing innovative solutions and fostering community engagement in transit planning are crucial. By committing to both technological upgrades and equitable service distribution, SEPTA can ensure that every commuter, regardless of where they live, has reliable and timely access to public transit. This not only improves individual daily experiences but also supports broader goals of economic mobility and sustainable urban development.

As stakeholders, we are called upon to support these initiatives, advocate for transparent and inclusive planning processes, and ensure that the voices of all commuters, especially those most affected like Midge, are heard and addressed in the pursuit of a truly connected and equitable transit system.